Remarks 

The Examiner is thanked for the courtesy shown during the telephone interview on February 
22, 2005. During the interview the Applicants' representative explained that the specification 
erroneously identified SEQ ID NO: 2 as the coding sequence for the dep protein. Similarly, SEQ ID 
NO: 1 is erroneously described as containing the coding region. In fact, SEQ ID NOs: 1 and 2 
include the coding region for ydhC. As shown in Fig. 3 and discussed, for example, on page 9, lines 
13-23 and page 11, lines 21-25, ydhC is distinct from dep. 

To clarify, the dep protein has the amino acid sequence of SEQ ID NO: 3. Neither SEQ ID 
NO: 2 nor SEQ ID NO: 1 includes the coding sequence for SEQ ID NO: 3. A BLASTP alignment 
between SEQ ID NO: 3 and the amino acid sequence encoded by SEQ ID NO: 2 is enclosed. The 
alignment report indicates that "[n]o significant similarity was found". This amendment corrects the 
inadvertent error in the specification. 

The amendment introduces no new matter. The disclosure of the appropriate amino acid 
sequence (SEQ ID NO: 3) of a protein is adequate written description and renders definite claims 
directed to a gene that encodes the protein. In re Wallach, 71 U.S.P.Q. 2d 1939 (Fed Cir 2004). 

It is the Applicants' understanding that the previous rejections under 35 U.S.C. §§ 102 and 
103 were based on the false premise that the protein of SEQ ID NO: 3 was encoded by the ydhC 
gene. Because the error in the specification and resultant misunderstanding have been corrected, it is 
requested that the rejections based on the ydhC gene be reconsidered and withdrawn. 

The ydhC gene is identified in GenBank record Accession No. AE000261, which was 
purportedly submitted by Blattner in connection with the 1997 Science article cited in previous 
Official Actions. It is the Applicants' understanding that the Examiner has taken the position that a 
plasmid having the complete coding sequence for ydhC was inherently disclosed in the reference 
because Blattner allegedly necessarily created such a plasmid in the course of sequencing the 
complete genome of E. coli. 

It is hereby noted that SEQ ID NO: 3 is identical to the amino acid sequence described as the 
translation of gene b1657 in GenBank record Accession No. AE000261. The product of b1657 is 
merely described as a putative transport protein. It is respectfully submitted that Blattner (in 
combination with Accession No. AE000261) neither inherently discloses, nor renders obvious, a 
plasmid having a gene that codes for the dep protein. 

Blattner does not inherently disclose a plasmid having a gene encoding dep 

In order for a reference to inherently anticipate a claim element, the element must necessarily 
be present in the reference. MPEP § 2112. The fact that a certain condition may be present in the 
prior art is insufficient to establish inherency. In re Oelrich, 212 USPQ 323, 326 (CCPA 1981). 

The activities of Blattner did not necessarily create a plasmid that includes a gene encoding 
the dep protein. Although Blattner identified b1657 as a gene coding for the amino acid sequence of 
SEQ ID NO: 3, it is respectfully submitted that Blattner did not necessarily create a plasmid 
containing the complete sequence of b1657. According to the record for Accession No. AE000261, 
the b1657 gene is located between sodB and purR, which are found between positions 1,700,000 and 
1,800,000 on the E. coli genome (see the map foldout of Blattner). According to Blattner (page 
1453, column 3, lines 24), this region of the genome was sequenced using the M13 Janus shotgun 
strategy. This method is described by Burland, et al. (1993) Genome sequencing on both strands: the 
Janus strategy, Nucleic Acids Res 21(15):3385. The Janus strategy article was coauthored by 
Frederick Blattner and a copy is enclosed. 

According to the Janus strategy, samples are sonicated for 50 seconds to randomly shear 
DNA into fragments of from a few hundred base pairs to several kilobase pairs. The fragments are 
size-fractionated, and fragments of 0.7-2 kb are collected and ligated into the vector for sequencing. 
The majority of the fragments collected are length 1-2 kb. See, page 3386, column 1, paragraphs b 
and c of Burland, et al. 

According to the record for Accession No. AE000261, b1657 is 1170 bp (about 1.2 kb) in 
length. In the process of randomly creating 1-2 kb fragments by sonication, there is a significant 
chance that no single fragment would contain any particular 1.2 kb segment. In fact, it is not even 
possible for a segment of this size to fit into fragments at the lower end of the range of fragments 
selected and cloned in the Janus shotgun method. Therefore, there is a significant chance that the 
Janus shotgun strategy used by Blattner produced no random fragment that contained the entire 1170 
bp sequence of b1657. It follows that the Blattner sequencing method did not necessarily produce a 
plasmid having the entire sequence of b1657. Therefore, the Applicants respectfully submit that 
Blattner did not necessarily create a plasmid that would have the entire b1657 sequence or otherwise 
encode a protein that confers DHCP resistance. The mere possibility that such a gene might have 
been created is not adequate to support an inherency rejection. See Oelrich at 326. 
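
For illustration only, the per-fragment arithmetic behind this argument can be sketched as follows. The sketch assumes that a fragment's start position is uniformly random within a linear region and that the gene lies away from the ends of that region. The 20 kb region length is borrowed from the lambda clone example in Burland, et al., and the function name is hypothetical. It is a simplified model and not a reconstruction of the library actually prepared by Blattner.

```python
# Illustrative model only: one fragment with a uniformly random start
# position on a linear region. All names and the 20 kb region are assumptions.

GENE_LEN = 1170       # b1657 length in bp, per Accession No. AE000261
REGION_LEN = 20_000   # assumed lambda clone size (example in Burland, et al.)


def containment_probability(fragment_len, segment_len=GENE_LEN,
                            region_len=REGION_LEN):
    """Chance that a single random fragment spans one fixed segment entirely.

    Assumes the segment lies far enough from the region ends that every
    covering start position is available.
    """
    if fragment_len < segment_len or fragment_len > region_len:
        return 0.0  # a fragment shorter than the segment can never contain it
    covering_starts = fragment_len - segment_len + 1
    possible_starts = region_len - fragment_len + 1
    return min(1.0, covering_starts / possible_starts)


if __name__ == "__main__":
    # Sizes spanning the 0.7-2 kb range collected in the Janus method.
    for size in (700, 1000, 1170, 1500, 2000):
        print(f"{size} bp fragment: {containment_probability(size):.4f}")
```

Under this model, fragments from the 0.7 kb end of the collected range return zero, and a 2 kb fragment covers the whole gene only when its start falls within an 831 bp window. The overall chance across a library also depends on the number and size distribution of the clones, which the record does not quantify.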

Blattner does not render obvious a plasmid having a gene encoding dep 

It would also not have been obvious to create a plasmid that includes the DHCP resistance 
gene, dep, at the time the invention was made. A proper obviousness analysis 
requires the difficult but critical step of casting the mind back to the time the invention was made. In 
re Dembiczak, 50 USPQ2d 1614, 1616-17 (Fed Cir 1999). At that time, b1657 was merely identified 
as a putative transport protein, one of 146 such proteins identified by Blattner. Blattner provides no 
evidence that b1657 is expressed in E. coli, no reliable basis to believe that it encodes a transporter if 
it is expressed, no description of what b1657 might transport if it is a transporter, and certainly no 
suggestion that b1657 would be useful for conveying resistance to DHCP. 

At best, the disclosure of b1657 as a putative transport protein could be characterized as 
general guidance to perform further research on the gene. This suggestion establishes a classic 
"obvious to try" situation. In such cases, a general disclosure might "pique a scientist's curiosity, 
such that further investigation might be done as a result of the disclosure, but the disclosure itself 
does not contain a sufficient teaching of how to obtain the desired result, or that the claimed result 
would be obtained if certain directions were pursued." In re Eli Lilly & Co., 14 USPQ2d 1741, 1743 
(Fed Cir 1990). Just as discussed in Eli Lilly, given the description of "putative transport protein" a 
scientist might be inclined to further research the role of b1657. It is a long-standing rule that this 
type of "obvious to try" scenario does not render the invention obvious. In re Deuel, 34 USPQ2d 
1210, 1216 (Fed Cir 1995). 

Blattner does not contain any teaching as to how to find a gene that confers DHCP resistance. 
In addition, there is no suggestion that a gene conferring DHCP resistance could be found if known 
techniques of molecular biology were employed. Finally, Blattner does not disclose or suggest that 
b1657 is a DHCP resistance gene. In other words, based on the teachings of Blattner, one skilled in 
the art would have no reasonable expectation of success in creating a plasmid that encodes a 
DHCP efflux protein based on the mere description that b1657 is a putative transport protein. 

A similar issue was decided by the Board of Patent Appeals and Interferences in Ex Parte 
Obukowicz, 27 USPQ2d 1063 (Bd Pat App & Int 1992). In that case, the applicant was claiming a 
method of combating plant insect pests by applying a plant-colonizing bacteria having within its 
chromosome heterologous DNA encoding for the protein toxin of Bacillus thuringiensis. The 
examiner issued a rejection based on a primary reference that described incorporation of the claimed 
gene into plasmids of various bacteria. The primary reference further suggested incorporation of the 
gene into the plant itself to create a systemic toxin or into specific bacteria that have superior 
survival characteristics. 

In reversing the examiner's rejection under 35 USC § 103, the Board characterized these 
statements in the reference as "an invitation to scientists to explore a new technology that seems a 
promising field of experimentation" and "of the type that gives only general guidance". Further, the 
Board reiterated the long standing rule that these types of general suggestions may make an approach 
"obvious to try", but do not make the invention obvious. 27 USPQ2d at 1065. 

Further, the Board observed that given the teachings of the applicants' specification regarding 
incorporation of the gene into the bacterial chromosome and using the bacteria in the plant 
environment, one can explain the rationale for the invention using selected teachings from the prior 
art. This approach, however, was recognized as impermissible hindsight. Id. 

The present situation falls squarely into the same category as the "obvious to try" cases. At 
best, Blattner provides general guidance on genes in the E. coli genome, and possibly even a 
motivation to scientists to perform further research on the 146 putative transport proteins disclosed 
therein. However, there is nothing in the reference that gives one skilled in the art a reasonable 
expectation of success in creating a plasmid that contains a gene encoding a DHCP efflux protein. 
Therefore, the invention is non-obvious over Blattner. 

Claim Rejections under 35 U.S.C. § 112 ¶ 2 

Claims 16-18 have been rejected as allegedly indefinite based on the term "disrupted" as used 
in connection with the pur, ydhC or ydhB genes. Claim 17, relating to a disrupted ydhC gene, has 
been cancelled. The term "disrupted" has been clarified in claims 16 and 18 by reciting the 
restriction enzymes that can be used to "disrupt" the purR and ydhB genes, respectively. One skilled 
in the art would readily be able to determine the relevant restriction sites (MluI and NruI or Eco47III, 
respectively) within each gene. Given the new recitations, one skilled in the art would be able to 
determine the sites at which the genes at issue would be cleaved or "disrupted" without undue 
experimentation. As such, the recitations are clear and definite. 

Claims 16-19 have been objected to based on the terms "pur gene", "ydhC gene" and "ydhB 
gene". Specifically, the Official Action indicates that the nature of these genes is unclear. The 
recitation of "pur" has been amended to "purR" in claims 16 and 19. With respect to all three terms, 
a claim term is definite if one skilled in the art would understand the scope of the claim when read in 
light of the specification. Union Pacific Resources Co. v. Chesapeake Energy Corp., 57 USPQ2d 
1293 (Fed Cir 2001). The specification clearly describes the genes in question at page 11, lines 21- 
25. PurR encodes purine synthesis repressor. YdhB encodes a homologue of the cyn operon 
transcriptional activator. The ydhC gene encodes a homologue of bicyclomycin. As such, the 
applicants respectfully submit that one skilled in the art would understand the nature of these genes 
in light of the specification. 

Claim Rejections under 35 U.S.C. § 112 ¶ 1 - Written Description 

Claims 16-19 have been rejected as allegedly failing to meet the written description 
requirement. In this respect, the Official Action indicates that the specification provides an adequate 
written description for the genus of plasmids encoding dep, but fails to envision certain subgenera 
defined by the absence of specific genes or the inclusion of disrupted forms of those genes. 

The objection to claim 16 specifically relates to disrupted forms of the purR gene. As noted 
above, this claim has been amended to further define the subgenus to include the purR gene 
disrupted by the restriction enzyme MluI. It is respectfully submitted that the specification envisions 
the full scope of this subgenus. 

The objection to claim 18 relates to disrupted forms of the ydhB gene. This claim has been 
amended to recite that the ydhB gene is disrupted by NruI or Eco47III. The specification envisions the 
full scope of this subgenus as well. 

Claim 19 defines a subgenus based on its independence from purR, ydhC and ydhB. Claim 
19 has been subordinated to depend from new independent claim 21 (as have claims 16 and 18). As 
noted in the Official Action, the application includes support for the subject matter of the new 
independent claim (the genus of plasmids having a gene that encodes dep). As a dependent claim, 
claim 19 properly further limits claim 21. Moreover, the narrower scope of claim 19 is fully 
supported by the specification, such as in the embodiment of pSP007, into which dep was cloned 
without purR, ydhC and ydhB. Because the cloning of genes into plasmids is routine in the art, it is 
respectfully submitted that no further examples are necessary to describe the subgenus of dependent 
claim 19. 

Claim Rejections under 35 U.S.C. § 112 ¶ 1 - Enablement 

Claim 20 has been rejected as allegedly lacking an enabling disclosure. Specifically, it is 
indicated that one skilled in the art must use pSP007 to practice the invention. As indicated in the 
Applicants' response of May 11, 2004, pSP007 was deposited with the American Type Culture 
Collection, 10801 University Boulevard, Manassas, Virginia 20110-2209, on September 5, 2001 and 
assigned Accession No. PTA-3682. The specification has been amended to reference the deposit in 
compliance with 37 C.F.R. § 1.809(d). Further, it is hereby certified that all restrictions on 
accessibility to pSP007, on deposit with the American Type Culture Collection under Accession No. 
PTA-3682, will be irrevocably removed by the Applicants upon the granting of a patent. 

New claims 

New claims 21-25 are directed to plasmids and isolated nucleic acid molecules comprising 
the dep gene. These claims are patentable for the reasons set forth above, i.e., they are neither 
inherently anticipated nor rendered obvious by Blattner. Support for the new claims can be found 
throughout the specification, for example, at page 11, line 26 through page 12, line 8. 

New method claims 26 and 27 have been added to the application. These claims are directed 
to methods of using plasmids of the invention in anticipation of rejoinder once claims directed to the 
plasmids are allowed. Support for the method claims can be found, for example, in original claim 14 
and on page 8 of the specification. 

Conclusion 

The application has been amended to correctly reflect that SEQ ID NOs: 1 and 2 include the 
coding sequence for ydhC and not for dep. Therefore, it is respectfully requested that the rejections 
under 35 U.S.C. § 103 based on Blattner be reconsidered and withdrawn. It is further requested that 
similar rejections not be established based on b1657 because Blattner neither inherently discloses nor 
renders obvious a plasmid having a gene that encodes a DHCP efflux protein. It is also requested 
that the rejections and objections under 35 U.S.C. § 112 be withdrawn for the reasons set forth above. 



It is respectfully submitted that the entire application is now in condition for allowance, 
which action is respectfully requested. If the Examiner believes that minor amendments or attention 
to other matters of form will advance the case, the Examiner is invited to telephone the Applicants' 
undersigned representative. 



Respectfully submitted, 

Paul Carango 
Reg. No. 42,386 
Attorney for Applicants 



PC:SAN:vbm 
(215) 656-3381 



Blast Result 



BLAST 2 SEQUENCES RESULTS VERSION BLASTP 2.2.10 [Oct-19-2004] 



Matrix: BLOSUM62    gap open: 11    gap extension: 1 
x_dropoff: 50    expect: 10.0001    wordsize:    Filter 

Sequence 1 lcl|seq_l Length 389 

Sequence 2 lcl|seq_2 Length 403 
No significant similarity was found 



CPU time: 0.02 user secs. 0.01 sys. secs 0.03 total secs. 

Lambda K H 
0.326 0.138 0.399 

Gapped 
Lambda K H 
0.267 0.0410 0.140 



Matrix: BLOSUM62 

Gap Penalties: Existence: 11, Extension: 1 

Number of Sequences : 1 

Number of Hits to DB: 1111 

Number of extensions: 740 

Number of successful extensions: 1 

Number of sequences better than 10.0: 0 

Number of HSP's better than 10.0 without gapping: 0 

Number of HSP's gapped: 1 

Number of HSP's successfully gapped: 0 

Number of extra gapped extensions for HSPs above 10.0: 

Length of query: 389 

Length of database: 791,805,785 

Length adjustment: 132 

Effective length of query: 257 

Effective length of database: 791,805,653 

Effective search space: 203494052821 

Effective search space used: 203494052821 

Neighboring words threshold: 9 

Window for multiple hits: 0 

X1: 15 (7.0 bits) 

X2: 129 (49.7 bits) 

X3: 129 (49.7 bits) 

S1: 40 (21.7 bits) 

S2: 77 (34.3 bits) 
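
The effective-length figures in the report above are internally consistent, which can be checked with a few lines of Python. This sketch assumes the usual BLAST convention that the length adjustment is subtracted once from the query and once per database sequence; the function name is hypothetical and the input values are copied from the report.

```python
# Consistency check on the figures printed in the BLAST report.
# The subtraction convention is an assumption about how the report
# derives its "effective" lengths; the inputs are taken from the report.


def effective_search_space(query_len, database_len, length_adjustment,
                           num_sequences):
    """Return (effective query length, effective database length, search space)."""
    eff_query = query_len - length_adjustment
    eff_db = database_len - num_sequences * length_adjustment
    return eff_query, eff_db, eff_query * eff_db


if __name__ == "__main__":
    eff_query, eff_db, space = effective_search_space(
        query_len=389,
        database_len=791_805_785,
        length_adjustment=132,
        num_sequences=1,
    )
    print(eff_query, eff_db, space)
```

The three values returned match the report's effective length of query (257), effective length of database (791,805,653) and effective search space (203494052821).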



http://www.ncbi.nlm.r^ 



3/2/2005 



TAKARA BIO Patent Dept. -> PIPER RUDNICK 



© 1993 Oxford University Press 



Nucleic Acids Research, 1993, Vol. 21, No. 15 3385-3390 



Genome sequencing on both strands: the Janus strategy 

Valerie Burland, Donna L.Daniels, Guy Plunkett, III and Frederick R.Blattner 

Laboratory of Genetics, University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA 

Received May 8, 1993; Revised and Accepted June 15, 1993 



ABSTRACT 

The design of large scale DNA sequencing projects 
such as genome analysis demands a new approach to 
sequencing strategy, since neither a purely random nor 
a purely directed method is satisfactory. We have 
developed a strategy that combines these two methods 
in a way that preserves the advantages of both while 
avoiding their particular limitations. Computer 
simulations showed that a specific balance of random 
and directed sequencing was required for the most 
efficient strategy, termed the Janus strategy, which has 
been used in the Escherichia coli genome sequencing 
project. This approach depended on obtaining 
sequence easily from either strand of a cloned insert, 
and was facilitated by inversion of the insert in the 
engineered M13 vector Janus, by site-specific 
recombination. The inversion was accomplished simply 
by growth on the appropriate host strain, when the DNA 
strand incorporated into the new single stranded phage 
was complementary to that in the original phage, and 
was sequenced by the same simple protocol as the first 
strand. 

INTRODUCTION 

Large scale DNA sequencing projects such as whole genome 
analysis demand special attention to strategy: which is more 
efficient, to sequence random clones from a library or to sequence 
specific clones from an ordered set in a directed manner? The 
scale of genome projects also demands consideration of technical 
simplicity (suitability for automation), informatics requirements 
and cost, all vital in designing the most efficient system. 
Construction of random libraries allows sequencing to proceed 
without prior finely detailed mapping or specific subcloning steps, 
but a high order of redundancy is necessary for a randomly 
sequenced project to be assembled with no specific 'finishing' 
steps. Different directed strategies, on the other hand, yield 
sequences which fit economically together in a predetermined 
fashion, but require a significant level of analysis or design in 
the form of mapping work to identify a set of overlapping 
template subclones, or in designing new primers for sequencing 
and DNA amplification in the case of primer 'walking'. Clearly 
both these strategies have valuable features. In the Escherichia 
coli genome project (1, 2) we have combined the particular 
benefits of each, using random sequencing to collect the bulk 
of the data initially, followed by a simplified directed strategy 
that uses no new clones or technical procedures, to complete the 
sequence. Computer simulations were used to investigate the 
effects of different combinations of random and directed data 
collection on cost and efficiency, and the results indicated the 
optimum scheme. Seamless integration of the two strategies 
was facilitated by a bacteriophage M13 vector, Janus, engineered 
to allow sequencing of both strands from single stranded 
templates. A single cloning process yields Janus library clones 
whose inserts may be inverted simply by propagation on a host 
expressing a site specific recombinase, Int from phage lambda 
(3). The packaged genome then incorporates the second strand 
of the insert, and the same simple protocol may be used for 
template preparation. Sequencing random Janus clones followed 
by selection of specific clones for sequencing after inversion leads 
to a particularly cost-effective strategy striking a balance between 
random and directed approaches. 

MATERIALS AND METHODS 

Recombinant DNA techniques 

Standard methods (4) were used for the construction of Janus 
with few modifications. Enzymes were used according to the 
manufacturers' instructions and were obtained from Boehringer- 
Mannheim (restriction enzymes), United States Biochemical 
(Sequenase, Mung bean nuclease), New England Biolabs (T4 
DNA ligase), and Stratagene (alkaline phosphatase). The attP 
fragment was obtained from pPH54 (5) provided by A.Landy. 
The fragment was isolated by gel electrophoresis through 2% 
Low Melting Point agarose (BRL) in Tris-acetate buffer, and was 
eluted using Geneclean (Bio101 Inc.). Bacterial strains were 
obtained from New England Biolabs (JM101) and Stratagene 
(XL1-blue). 

Oligonucleotides 

The attB oligonucleotide strands (sequences are shown in 
Fig. 3 A) and special sequencing primers were synthesised by the 
Beckman Research Institute of the City of Hope. A primer was 
designed for ease of sequencing across the oflfipartof the Janus 
candidates to confirm the structure of both the original and 
inverted forms, and also to analyse false positives in the libraries. 
This primer hybridised within the lacZ gene about 100 bases 3' 
of the cloning site; the primer sequence is 5' CCTCTTCGCTATTACGCC 3'. 

Construction of random libraries 

a) Growth and preparation of DNA for cloning. For large DNA 
preparations, Janus was grown in NZY medium on host JM101; 
phage were collected from 2 liter culture supernatants by 
precipitation with 4% PEG, 0.5M NaCl, and banded on cesium 
chloride gradients. After dialysis to remove CsCl, DNA was 
released by phenol extraction and then dialysed extensively. The 
double stranded replicative form of Janus was prepared as a 
plasmid by alkaline lysis (4) and purified by cesium chloride 
density gradient centrifugation. Lambda clones were grown in 
2 liters of NZC, using host strain LE392, and MOI = 0.01 (6). 
Phage were collected by PEG precipitation and purified by cesium 
chloride banding; DNA was prepared from the phage as for single 
strand Janus. 

b) Sonication and size fractionation of target DNA. 5 µg of lambda 
clone DNA were adjusted to 20 µl with TE buffer in 0.5ml 
microfuge tubes. The samples were sonicated for 50 seconds 
using a cup-horn probe, with fresh ice and water at the same 
level for each sample. Power was delivered by a Branson Sonifier 
(Cell Disrupter W-350) set at 17% of duty cycle and output 
control at 1, giving an output of 200W. These settings 
sheared all of the phage DNA, giving a size range from 2-3 
hundred base pairs (bp) to about 10 kilobase pairs (kb), evenly 
distributed. The sonicated DNA was then repaired by digestion 
with 15 units of Mung bean nuclease, at room temperature for 
10 minutes. The sample was then immediately loaded onto a 1% 
Seaplaque GTG agarose gel (FMC BioProducts) in Tris-acetate 
buffer and electrophoresed for 25 minutes at 25 volts. The parts 
of the gel containing the marker lanes were cut off and stained 
with ethidium bromide. The gel was then reassembled and the 
0.7-2 kb sonicated DNA cut out and eluted by Geneclean 
(Bio101 Inc.), into 20 µl TE buffer; the yield w 
Samples in which sonication produced an uneven distribution of 
material in the 0.7-2.0 kb range were not used as they often 
contain an excess of very small fragments which clone more 
efficiently than the 1-2 kb pieces desired. 

c) Cloning with Janus. In the Escherichia coli genome project 
(1, 2) Janus was used to make subclone libraries starting from 
E. coli clones in phage lambda (7). The following procedure was 
adapted from Bankier et al. (8). The vector was prepared by 
digestion with Sma I and treated with alkaline phosphatase to 
prevent recircularization. Target DNA was prepared from lambda 
clones by sonication to produce fragments ranging in size from 
0.5 to several kb. Target DNA ends were repaired by digestion 
of single strand extensions with Mung Bean nuclease; other 
enzymes such as Klenow polymerase or T4 polymerase, or 
combinations of these, all give similar results. After end repair, 
the target DNA was size-fractionated by agarose gel 
electrophoresis, and fragments from 0.7 to 2kb were collected. 
Purified fragments were ligated into the prepared vector and 
transformed into XL1-blue competent cells prepared by the 
method of Hanahan (protocol 2 in reference 9). When these 
transformed bacteria were plated, JM101 was used to provide 
the bacterial lawn. This combination of host strains permitted 
the maximum efficiency of transformation and good growth of 
plaques on agar, both of which are vital for optimal recovery 
of library clones and fast throughput in a large project. Clones 
containing lambda DNA inserts were identified and discarded 
before sequencing. 

To test the quality of prepared vector batches and monitor the 
cloning process, a preparation of lambda DNA was partially 
digested with Alu I. After fractionation to prepare the same size 
range as the sonicated targets, it was ligated into the vector 
sample. Since Alu I leaves blunt ends, no repair step is necessary 
and the cloning efficiency is at least 10 fold higher than that of 
the end-repaired target. 

The yield of clones (colorless plaques) was between 1000 and 
3000 per 100 ng prepared target DNA. In comparison with 
enzymatically digested DNA, mechanically sheared DNA clones 
inefficiently because many of the ends are unrepairable and 
fragments may also be internally damaged. However, all the 
restriction enzymes we tested singly or in combination (10) did 
not produce truly random libraries and mechanical methods are 
preferred. In the Janus libraries the ratio of clones to vector 
(colorless to blue plaques) was about 2 or 3 to 1, and the 
background of false positives (colorless plaques from a vector- 
only ligation) was 2 or 3 per 100 blue. These blue plaques derive 
from uncut or unphosphatased molecules and the few false 
positives result from loss of bases at the Sma I site. 

Screening for lambda in the Janus libraries 

Library plaques were toothpicked into 100 µl TE buffer in 
microtiter dish wells. Each dish was then replicated onto a lawn 
of bacteria (JM101) in top agar and incubated 18 hours at 37°, 
producing the microtiter dish array of phage samples as clear 
patches of lysis. The set of patches were lifted onto Nytran 
membranes (Schleicher and Schuell), denatured and neutralized, 
then baked at 68° for at least 2 hours. The membranes were 
hybridized to a lambda probe made by labelling Hind III-cut DNA 
by nick translation with 32P. Standard procedures were followed 
(4). The membranes were exposed to Kodak XAR5 film and the 
hybridizing patches readily identified; these are marked in the 
microtiter dish wells with food color and the remaining samples 
transferred to a new dish. After they have been used to inoculate 
mini-cultures for sequencing, the phage samples are solidified 
by addition of 75 µl of melted 0.4% agarose in TE buffer, cooled, 
sealed and stored at 4°. 

DNA sequencing 

For the large scale sequencing project, Janus library clones were 
grown and DNA template preparation carried out in microtiter 
dishes (11) for random sequencing. To analyse candidates for 
the Janus construct, templates prepared from 1ml cultures were 
extracted and purified in microfuge tubes (12). DNA sequencing 
was by the chain termination method of Sanger (13) and was 
performed by an automated system, using internal 35S label, 
acrylamide gel electrophoresis and autoradiography (14). 
Reagents and Sequenase were obtained from United States 
Biochemicals. 

Inversion of Janus inserts 

The inverting host strain was constructed by transforming JM101 
with the plasmid pHS3-l provided by J. Gardner (15), containing 
the cloned lambda int gene fused to the promoter. The 
transformed strain, called FB1898, was maintained on minimal 
agar with ampicillin (100 µg/ml). For inversion of selected Janus 
clones on this host, bacteria were grown in NZY broth containing 
ampicillin, to mid-exponential phase, then 2ml portions were 
inoculated with the Janus samples. IPTG was added to 0.1 mM 
and the cultures incubated at 37° with shaking for 6 hours to 
give titers similar to those obtained on JM101 without the 
plasmid. Phage was harvested and DNA templates prepared as 
for noninverted clones. In tests of Janus plaque color, spontaneous 
inversions and reversions were not detected; the frequencies of 
both were less than 1 in 10^5. 



2005 03/06 18:57 FAX 077 543 7295 TAKARA BIO Patent Dept. -> PIPER RUDNICK 



Software 

Computer programs for data collection, sequence assembly, and 
sequence analysis (1) were from DNASTAR packages. The 
program for the strategy simulations was specially written. 

RESULTS 

Strategy of sequencing with Janus 
For each lambda clone, sequences were collected from the 
random Janus library until sufficient data were obtained to allow 
assembly (alignment) of sequences into contigs of several kb in 
length. For a library made from a 20 kb lambda clone, this 
number is about 500 sequences depending on the length of 
sequence readouts and data quality. These data assembled into 
3-4 contigs having an average coverage of 6-8x, with some 
areas of only 1 or 2x coverage and other areas with data from 
only one strand. To address all these problem areas, specific Janus 
clones were selected by inspection of the alignment. Taking into 
account the average insert length, clones were chosen so that after 
inversion, new sequences collected from the opposite strand and 
the opposite end of the inverted clones would cover the problem 
areas. The new data extended the ends of contigs enabling them 
to merge, as well as improving coverage to meet the minimum 
standard of at least four determinations at every point and at least 
one on each strand, essential for the level of accuracy needed 
to locate authentic open reading frames and other features. This 
strategy has been used in the E. coli genome project and is 
illustrated in Fig. 1, a detailed example from one of the sequenced 
lambda clones (2) where sequencing inverted clones improved 
a poorly covered area. 

Figure 1. Sequencing strategy diagram generated by Seqman alignment computer 
program, edited for clarity. Data from the E.coli genome project (2); sequenced 
lambda clone Ee 27-236 (at 83.3 minutes on the physical map), a small portion 
of the lambda project is shown. Numbers represent Janus clone names. 
A: alignment of sequences collected in the random phase of the project, boxed 
areas are poorly covered or have data from one strand only. Clones 16, 230, 
250 and 1125 were selected to provide additional data by inversion, and sequence 
was gathered from the opposite end for the complementary strand as described. 
B: alignment with the new sequences added (hatched arrows), showing improved 
coverage and second strand data for most of the problem areas. The two sequences 
from before and after inversion may overlap depending on the clone length and 
the length of the sequence. Data overlapping the ends of the region were 
removed for clarity. 

Simulations of combined strategies 

To assess the optimum mix of random and inverted sequencing, 
a theoretical analysis was carried out. The effect of switching 
at different times from random data collection to selection of 
specific clones for inversion was analysed using a computer 
program to compare different finishing methods for simulated 
assemblies of random sequences. The simulation calculated the 
cost of finishing a 20 kb project by varied amounts of random 
sequencing followed by primer walking or Janus inversions to 
finish. Collection and assembly of raw data were determined by 
random number generation with assumed values for data quality, 
sequence readout length and cost of each operation. The minimum 
criteria for coverage require at least two determinations on each 
strand at every point. The number of finishing steps required 
(gaps or thin areas) was captured at different times during the 
simulated assembly. The curves in Fig. 2 show the cost of a 
sequencing project using different strategies: cost was plotted as 
a function of the amount of random sequence collected before 
finishing by a directed method. The upper curve represents 
finishing solely with primer walking steps, and the lower curve 
shows finishing with as many Janus inversions (flips) as possible 
followed by the much smaller number of primer walking steps 
needed. In this case, after collecting random sequences to each 
fold of coverage, the computer examined the assembly for places 
where Janus clones may be flipped and sequenced from the 
second strand to advantage. Finally a few primers were used to 
close the sequence. In either case the result was coverage at least 
twice on each strand at every point. 
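
The random phase of such a simulation can be sketched as follows. This is a minimal illustration only, not the authors' program: the function names, the uniform placement of reads, the equal chance of either strand, and the default values (20 kb target, 400-base reads) are assumptions made for the example, and the cost model and the flip and primer-walking steps are not modelled.

```python
import random


def simulate_coverage(target_len=20_000, read_len=400, n_reads=500, seed=0):
    """Place reads at random positions on random strands.

    Returns a dict with one per-base depth list for each strand.
    All parameters are illustrative assumptions, not values from the paper's code.
    """
    rng = random.Random(seed)
    depth = {"+": [0] * target_len, "-": [0] * target_len}
    for _ in range(n_reads):
        strand = rng.choice("+-")
        # Every read lies fully inside the target, so the ends are under-sampled.
        start = rng.randrange(0, target_len - read_len + 1)
        for pos in range(start, start + read_len):
            depth[strand][pos] += 1
    return depth


def mean_fold_coverage(depth):
    """Average number of determinations per base, both strands combined."""
    total = sum(depth["+"]) + sum(depth["-"])
    return total / len(depth["+"])


def count_thin_regions(depth, min_per_strand=2):
    """Count contiguous runs where either strand is below the minimum depth.

    Each run stands for one finishing step (a gap or thin area).
    """
    regions = 0
    in_region = False
    for plus, minus in zip(depth["+"], depth["-"]):
        thin = plus < min_per_strand or minus < min_per_strand
        if thin and not in_region:
            regions += 1
        in_region = thin
    return regions


if __name__ == "__main__":
    for n_reads in (100, 250, 500, 1000):
        d = simulate_coverage(n_reads=n_reads)
        print(n_reads, round(mean_fold_coverage(d), 1), count_thin_regions(d))
```

Running the loop at increasing read counts shows how the number of thin regions falls as fold coverage rises, which is the quantity the simulation captured before pricing each finishing method.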




Figure 2. Comparison of costs of sequencing strategies with and without Janus 
inversions (flips) by computer simulated 
assemblies. Strategies using different combinations of primer walking and random 
sequence collection with no inversions (open squares), upper curve; with inversions 
(filled diamonds), lower curve. Cost per finished base ($) was plotted against 
average number of determinations at each point by random data collection (fold 
coverage). Cost per finished base included the fixed costs of labor and materials 
in the current project for template preparations, sequencing reactions, gels and 
autoradiography but did not include management, research or other overhead. 
Assumed values are: cost per raw base, 2.43 cents; thus cost per gel of 24 sequences, 
at 85% yield and 400 bases per readout, $200; cost of primer for each walking 
step, $30; primer spacing 200 bases; average length of insert in Janus clones, 1 kb. 







The zero fold point (no random sequencing) is a pure primer 
walking strategy. At 5 fold coverage without flips, the number 
of primers is reduced more than 90% and the cost to finish more 
than halved. At 19 fold coverage, no primers are required and 
the strategy is pure random. The minimum cost for finishing by 
primer walking is at 8-9 fold random coverage, when 30-40 
primers are needed. Using the Janus strategy, the minimal cost 
is between 5 and 7 fold random coverage. At 7 fold, 61 very 
cheap inversion steps reduce the need for new primers from 58 
(without flips) to 6. These data are summarized in Table 1. Thus 
the point to switch to flipping is at 5 to 7 fold coverage for 
optimum efficiency, achieved with a 25% saving in cost. Above 
10 fold the costs are equivalent and increase linearly as the burden 
of obtaining very large quantities of data outweighs the smaller 
number of finishing steps needed. 
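
As a rough check on the quoted saving, the relative cost of each strategy can be recomputed from the per-base costs listed in Table 1. The sketch below is illustrative: the dictionary keys and the `saving` helper are names chosen for this example, and only the four cost figures come from the table.

```python
# Cost per finished base ($), as listed in Table 1.
COST_PER_BASE = {
    "primer walking only": 0.78,
    "random only": 0.43,
    "mixed": 0.26,   # random plus primer walking
    "janus": 0.19,   # random plus Janus inversions plus primers
}


def saving(baseline, alternative, costs=COST_PER_BASE):
    """Fractional saving of `alternative` relative to `baseline`."""
    return (costs[baseline] - costs[alternative]) / costs[baseline]


if __name__ == "__main__":
    for name in ("primer walking only", "random only", "mixed"):
        print(f"Janus vs {name}: {saving(name, 'janus'):.0%} cheaper")
```

On these figures the Janus strategy comes out a little over a quarter cheaper than the mixed strategy without flips, broadly in line with the saving of about 25% stated in the text.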

We also tested a number of other assumptions. A fourfold 
reduction in the cost of primers reduced the cost of the pure 
primer walking strategy by half, whereas a tenfold drop in primer 
cost produced a saving of two thirds. At 100 fold reduction in 
primer costs, such as might be obtained by the use of pentamer, 
hexamer or nonamer libraries (16, 17, 18), the cost per base for 
pure primer walking was reduced four fold. This does not take 
into account the high initial cost or the data management 
challenge. As primer costs become negligible there is little 
difference between the two mixed strategies by this simulation. 
At this point other considerations become more important. 

Design and construction of Janus 

M13 cloning vectors (19, 20) have the proven advantage of 
producing single stranded templates for primer extension (13) 
without competition from the complementary strand. Since M13 
grows without lysing the host bacteria, DNA may be isolated 
easily by harvesting bacteriophage from culture supernatants, free 
from cell proteins, RNA or host genomic DNA. In addition the 
vector is distinguishable from phage containing cloned inserts 
by screening for plaque color (20). Janus was constructed from 
M13mp19 so that these features were preserved. Obtaining 
sequence from the second strand of the cloned insert would 
normally entail recloning in the opposite orientation or extensive 
random sequencing of shotgun clones to obtain both strands by 
chance. The second strand could be obtained by enzymatic 
synthesis on the first strand template (21), or by purification of 
the double stranded replicative form from cell lysates, procedures 
which lose the advantage of the simple preparation protocol for 
single strand DNA. By engineering a recombination system to 
invert the insert during growth, the second strand is made 
available for packaging and preparation as single strand template. 

The site-specific recombination system for lambda integration 
into the E.coli genome was chosen for its efficiency and because 



Table 1. Summary of cost comparisons for different strategies 

Strategy               Cost (a)   Random coverage   Number of primers   Total coverage 
Primer walking only    0.78       0                 400                 4 
Random only            0.43       19                0                   19 
Mixed (b)              0.26       8-9               29-42               9.2 
Janus (c)              0.19       6-7               6-13                8.1 

(a) $ per finished base 
(b) Random plus primer walking 
(c) Random plus Janus inversions plus primers 



it is well understood. The lambda and E.coli att sites by which 
the phage integrates into the genome (attP and attB respectively) 
have been analyzed in detail (reviewed in 3) and the functional 
elements defined. Both att sites consist of a 15 basepair common 
core where the staggered cut sites for the Int recombinase are 
located. The attB site has a few bases on either side of the core 
which are essential for activity (22). The attP site is much larger, 
with essential flanking sequences of 160 and 82 basepairs, 
containing binding sites for the Int recombinase and IHF protein 
(5). Lambda integration is a reversible event but a second lambda-coded 
gene product (Xis) is required for excision. This gene is 
not present in the Janus construct, thus only one recombination 
event can occur. The hybrid att sites resulting from inversion 
are not recognised by Int so the inversion product is stable. The 
design of Janus is shown in Fig.3A. 

To construct Janus, the two att sites were engineered into 
M13mp19 with a unique cloning site between them. Their relative 
orientations were designed to invert rather than delete the cloned 
segment in an intramolecular recombination. Inversion is a site-specific 
recombination event and acts upon the double stranded 
DNA of the replicative form of the phage. The whole region 



Figure 3. Structure of Janus and inversion of the cloned insert. A) Diagram of 
the vector showing the relative positions of attB and attP [illegible] 
cloning site Sma I. Sequence of the synthetic segment is shown, with the attB 
core in boldface type; H3X indicates the 5' extension designed to ligate to a cut 
Hind III end but not regenerate the Hind III site. At the other end, RI indicates 
the Eco RI-complementary extension. The restriction site was regenerated but 
is not useful for cloning since it is on the wrong side of attB to obtain [illegible] 
by inversion. The dashed arrow indicates sequence obtained from the Universal 
primer. B) Inversion of the cloned insert in the presence of Int recombinase. Before 
inversion sequence is obtained by extension of the Universal primer (dashed line 
above the upper section). The arrows in the insert box indicate the orientation 
of the insert with respect to the primer site. Upon inversion, recombination 
junctions are formed by the fused halves of attB and attP at each site. Sequence 
is now obtained by extension of the Reverse primer, indicated by the dashed arrow. 






between the att sites, including the cloned insert and the Reverse 
primer site, is inverted relative to the origin of replication 
(Fig.3B) so that the strand now packaged into phage heads is 
that complementary to the packaged strand of the original clone. 
The Reverse primer site is also on the packaged strand of the 
inverted DNA and is appropriately placed for sequencing the 
inverted insert (Fig.3B). Reverse sequencing primers are 
commercially available. Molecules not inverted (about 3 percent 
measured by plaque color of the inverted vector) do not interfere 
with the sequencing reactions since these do not contain the 
Reverse priming site. 
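
At the sequence level, inverting the region between the two att sites amounts to replacing that segment with its reverse complement while the flanking vector stays in place. The sketch below is a toy model under that assumption: the coordinates and function names are hypothetical, and the hybrid junctions formed from the fused att half-sites are not modelled.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_between_sites(molecule, left, right):
    """Invert the segment molecule[left:right] in place.

    `left` and `right` are hypothetical coordinates standing in for the two
    att sites; the flanking vector sequence is left unchanged.
    """
    segment = molecule[left:right]
    return molecule[:left] + reverse_complement(segment) + molecule[right:]


if __name__ == "__main__":
    vector = "AA" + "ATGC" + "TT"  # flank + insert region + flank
    flipped = invert_between_sites(vector, 2, 6)
    print(vector, "->", flipped)
```

The packaged strand of the flipped molecule now carries the complement of the originally packaged insert strand, which is why a second round of single strand template preparation yields data from the opposite strand.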

Screening for plaque color in M13 depends on expression of 
the β-galactosidase α-complementing segment coded by part of 
the lacZ gene. Insertion of foreign DNA into the cloning site 
located in the exon disrupts the gene and the enzyme is no longer 
produced (19). An attB segment was designed to read in frame 
after insertion into the lacZ gene. The segment was synthesized 
with an adjacent Sma I site for library cloning (Fig.3A). Single 
strand extensions of Hind III and Eco RI restriction sites were 
used to ligate the segment between these sites in M13mp19. The 
attP site was obtained as a 400 basepair Hind III-Bam HI 
fragment from plasmid pFH54, a subclone of the att region from 
lambda (5). The ends of the purified fragment were filled in using 
Klenow polymerase and ligated into the Ava II site (similarly 
filled in) in M13mp19. Thus the two att sites were introduced 
into M13mp19 with the unique Sma I site between them. The 
construct's ability to produce blue plaques on plates containing 
X-gal was confirmed. Since the attP fragment was ligated by 
a blunt end reaction, it was important to determine the orientation 
of attP within the construct. Each att insertion was identified by 
hybridization of candidate plaques to 32P-end labelled or nick 
translated att probes. Candidates which formed blue plaques and 
hybridized with both probes were tested for the ability to invert 
by a functional assay. Colorless and blue plaques were counted 
after growth in the presence of induced Int recombinase; inversion 
dissociates the lacZ promoter from the coding sequence and 
prevents expression of β-galactosidase. Correctly oriented 
constructs gave 100% blue plaques before inversion, and yielded 
more than 95% colorless plaques when plated 2 hours after Int 
induction in liquid culture. The structure of Janus was then 
confirmed by restriction site mapping and by sequencing across 
the attB segment in the uninverted and across the 3' recombination 
junction in the inverted form (Fig.3B). 

DISCUSSION 

Tackling the analysis of entire genomes has called for new ways 
of combining technically simple random sequencing with some 
level of directed approach for the finishing steps. In the E.coli 
project, the key to integrating random and directed sequencing 
strategies was the ability to invert the insert within the vector. 
Obtaining sequence from the second strand of selected clones 
without having to make double stranded template was a significant 
advantage and enabled the mixed strategy to be developed. Now 
project finishing is possible with a relatively small number of 
easy directed steps and no further mapping or subcloning, while 
the order of redundancy needed is much reduced. 

Purely directed strategies have the advantage that a much 
smaller amount of data is collected than in random projects, and 
that the fit of each new sequence to the existing data is tailored. 
The success of most directed strategies however, depends on 
precise identification and analysis of the template clones before 




sequencing: for example, nested deletions (23) and sequencing 
from mapped transposons (24, 25) entail prior characterization 
of the clones. Although there are strategies that avoid the work 
of subcloning and mapping, such as 'multiplex' sequencing (26) 
or primer 'walking', the design of oligo primers or probes from 
sequence at the end of a readout is a potentially limiting factor. 
Since new oligonucleotides must be designed and synthesized for 
each sequencing reaction, good design depends on having good 
data at the extreme end of the readout and long readouts are 
necessary to keep the number of primers to a minimum — both 
of course possible but difficult to sustain in a high throughput 
project. Failed reactions must always be repeated since all the 
sequences in the strategy plan are necessary for completion. In 
addition many starting points must be used simultaneously for 
closure in a reasonable time, creating significant costs of sample 
tracking and record keeping. In walking strategies, multiple 
hexamer priming (16, 17, 18) may reduce primer costs 
significantly. Such methods may eventually be able to use 
genomic DNA directly as a template, though this has yet to be 
demonstrated even by 'cycle' sequencing with a thermostable 
polymerase (reusing the template in thermal cycles) (27). Thus 
either some subcloning (28) is required, or the target DNA must 
be amplified from the genome [illegible] 
(29) before sequencing. Finally, if these methods are used to 
produce sequence with only one determination at each point, they 
become vulnerable to mutations in the template and to sequencing 
errors, not all of which are easily detected. 

The advantages of random sequencing are that no analysis of 
the template clones is necessary, that only one process is needed 
to collect the data, and that success does not depend on the success 
of any individual sequence reaction, but depends only on the total 
amount of data collected. Informatics requirements are therefore 
modest. These factors become particularly important when a 
large-scale project is undertaken. However, purely random 
strategies demand prohibitively large amounts of data for 
completion and accuracy. For example, to obtain coverage of 
each base at least twice on each strand, average coverage must 
approach 20x, or 1760 clones for a 20 kb segment compared 
to fewer than 600 by the Janus strategy or 400 by primer walking 
only. Janus takes advantage of the conceptual simplicity of the 
purely random strategy but reduces the amount of effort involved; 
at the same time the finishing steps are acquired at nearly the 
same cost as random data. 

ACKNOWLEDGEMENTS 

This is paper 3348 from the Laboratory of Genetics. 
Development, construction and verification of Janus was 
supported by the Romnes Fund through the University of 
Wisconsin Graduate School. The E.coli Genome Project is 
supported by NIH award HG00301. We thank M.Schwid, B.Fritz 
and E.Sommers for technical assistance, J.Gardner and A.Landy 
for kindly providing plasmids and N.Peterson for administration. 
We also thank the members of the E.coli Genome project teams. 



REFERENCES 

1. Daniels,D.L., Plunkett III,G., Burland,V. and Blattner,F.R. (1992) Science, 
257, 771-778. 

2. Burland,V., Plunkett III,G., Daniels,D.L. and Blattner,F.R. (1993) Genomics, 
16, 551-561. 

3. Landy,A. (1989) Ann. Rev. Biochem., 58, 913-949. 






4. Maniatis,T., Fritsch,E.F. and Sambrook,J. (1982) Molecular Cloning: A 
Laboratory Manual. Cold Spring Harbor University Press, Cold Spring 
Harbor. 

5. Hsu,P.-L., Ross,W. and Landy,A. (1980) Nature, 285, 83-91. 

6. Dunn,I.S. and Blattner,F.R. (1987) Nucleic Acids Res., 15, 2677-2698. 

7. Daniels,D.L. (1990) In Drlica,K. and Riley,M. (eds.), The Bacterial 
Chromosome. American Society for Microbiology, Washington, D.C., pp. 
43-52. 

8. Bankier,A.T., Weston,K.M. and Barrell,B.G. (1987) Methods in Enzymology, 
155, 51-93. 

9. Hanahan,D. (1985) In Glover,D.M. (ed.), DNA cloning-A Practical 
Approach. IRL Press, Oxford, Vol.1, pp. 109-135. 

10. D.L. Daniels and V. Burland, unpublished data. 

11. Olson,C.H., Blattner,F.R. and Daniels,D.L. (1991) Methods, 3, 27-32. 

12. United States Biochemical Corporation, 1990. Sequenase protocol booklet. 

13. Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 
74, 5463-5467. 

14. Daniels,D.L., Marr,L., Brumley,R.L. and Blattner,F.R. (1990) In 
Sarma,R.H. and Sarma,M.H. (eds.), Structure and Methods. Adenine Press, 
Guilderland, NY, Vol.1, pp. 29-35. 

15. Lee,E.C., Gumport,R.I. and Gardner,J.F. (1990) J. Bacteriol., 172, 
1529-1538. 

16. Kieleczawa,J., Dunn,J.J. and Studier,F.W. (1992) Science, 258, 1787-1791. 

17. Kotler,L.E., Zevin-Sonkin,D., Sobolev,I.A., Beskin,A.D. and 
Ulanovsky,L.E. (1993) Proc. Natl. Acad. Sci. USA, 90, 4241-4245. 

18. Siemieniak,D.R. and Slightom,J.L. (1990) Gene, 96, 121-124. 

19. Messing,J., Gronenborn,B., Müller-Hill,B. and Hofschneider,P.H. (1977) 
Proc. Natl. Acad. Sci. USA, 74, 3642-3646. 

20. Yanisch-Perron,C., Vieira,J. and Messing,J. (1985) Gene, 33, 103-119. 

21. Smith,V. and Chee,M. (1991) Nucleic Acids Res., 19, 6957. 

22. Mizuuchi,M. and Mizuuchi,K. (1985) Nucleic Acids Res., 13, 1193-1208. 

23. Barnes,W.M., Bevan,M. and Son,P.H. (1983) Methods in Enzymology, 101, 
98-122. 

24. Adachi,T., Mizuuchi,M., Robinson,E.A., Appella,E., O'Dea,M.H., Gellert,M. 
and Mizuuchi,K. (1987) Nucleic Acids Res., 15, 771-784. 

25. Liu,L., Whalen,W., Das,A. and Berg,C.M. (1987) Nucleic Acids Res., 15, 
9461-9469. 

26. Ohara,O., Dorit,R.L. and Gilbert,W. (1989) Proc. Natl. Acad. Sci. USA, 
86, 6883-6887. 

27. Innis,M.A., Myambo,K.B., Gelfand,D.H. and Brow,M.A.D. (1988) Proc. 
Natl. Acad. Sci. USA, 85, 9436-9440. 

28. Shyamala,V. and Ames,G.F. (1989) Gene, 84, 1-8. 

29. Strauss,E.C., Kobori,J.A., Siu,G. and Hood,L.E. (1986) Analytical Biochem., 
154, 353-360. 



